Brief Announcement: Parallel Depth First vs. Work Stealing Schedulers on CMP Architectures
Abstract
In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in which concurrently scheduled threads share a largely overlapping working set. In this brief announcement, we highlight our ongoing study [4] comparing the performance of two schedulers designed for fine-grained multithreaded programs: Parallel Depth First (PDF) [2], which is designed for constructive sharing, and Work Stealing (WS) [3], which takes a more traditional approach.
Related resources
Near Optimal Work-Stealing Tree Scheduler for Highly Irregular Data-Parallel Workloads
We present a work-stealing algorithm for runtime scheduling of data-parallel operations in the context of shared-memory architectures on data sets with highly irregular workloads that are not known a priori to the scheduler. This scheduler can parallelize loops and operations expressible with a parallel reduce or a parallel scan. The scheduler is based on the work-stealing tree data structure, w...
Compiler Support for Work-Stealing Parallel Runtime Systems
Multiple programming models are emerging to address an increased need for dynamic task parallelism in multicore shared-memory multiprocessors. Examples include OpenMP 3.0, Java Concurrency Utilities, Microsoft Task Parallel Library, Intel Threading Building Blocks, Cilk, X10, Chapel, and Fortress. Scheduling algorithms based on work-stealing, as embodied in Cilk’s implementation of dynamic spaw...
Confidence-Based Work Stealing in Parallel Constraint Programming
The most popular architecture for parallel search is work stealing: threads that have run out of work (nodes to be searched) steal from threads that still have work. Work stealing not only allows for dynamic load balancing, but also determines which parts of the search tree are searched next. Thus the place from where work is stolen has a dramatic effect on the efficiency of a parallel search a...
Program-Centric Cost Models for Locality and Parallelism
Good locality is critical for the scalability of parallel computations. Many cost models that quantify locality and parallelism of a computation with respect to specific machine models have been proposed. A significant drawback of these machine-centric cost models is their lack of portability. Since the design and analysis of good algorithms in most machine-centric cost models is a non-trivial t...
Thread Scheduling For Shared Caches ECE 742 Final Project Report
Simultaneous multithreading (SMT) processors and chip multiprocessors (CMP) with shared caches usually require a primary cache increase by a factor proportional to the number of execution contexts to retain the cache performance of the uniprocessor. In this paper we study depth-first task scheduling, which was recently shown to reduce the number of cache misses when a single multithreaded appli...
Publication date: 2007